Doc: Update Files metadata table #3422

szehon-ho · 2021-10-29T21:53:51Z

The Files metadata table documentation was missing some columns, and it was also missing an example for partitioned tables, in which the new partition column is added. (Users may want to know how to check which data files they have for a given partition.)
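As an illustration of the partition check mentioned above, here is a minimal sketch that assembles a Spark SQL query against the `files` metadata table, filtering on one field of the `partition` struct column. The table name `prod.db.sample` and the partition field `category` are hypothetical, and the sketch only builds the query string; running it would need a Spark session with an Iceberg catalog configured.

```python
def files_for_partition_sql(table, field, value):
    """Build a query listing the data files that belong to one partition value.

    `table`, `field`, and `value` are illustrative inputs, not names from this PR.
    """
    # String values are wrapped in single quotes; numbers are left bare.
    # No escaping is done here, so this is only suitable for trusted input.
    literal = f"'{value}'" if isinstance(value, str) else str(value)
    return (
        f"SELECT file_path, record_count, partition "
        f"FROM {table}.files "
        f"WHERE partition.{field} = {literal}"
    )


query = files_for_partition_sql("prod.db.sample", "category", "a")
print(query)
```

The resulting string could then be passed to `spark.sql(...)` in a notebook.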

kbendick · 2021-10-29T22:19:18Z

cc @samredai as the docs refactor is underway 👍

rdblue · 2021-10-31T16:46:52Z

site/docs/spark-queries.md

-| s3:/.../table/data/00001-4-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet | PARQUET     | 1            | 597                | [1 -> 90, 2 -> 62] | [1 -> 1, 2 -> 1] | [1 -> 0, 2 -> 0]  | []               | [1 -> , 2 -> b] | [1 -> , 2 -> b] | null         | [4]           |
-| s3:/.../table/data/00002-5-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet | PARQUET     | 1            | 597                | [1 -> 90, 2 -> 62] | [1 -> 1, 2 -> 1] | [1 -> 0, 2 -> 0]  | []               | [1 -> , 2 -> a] | [1 -> , 2 -> a] | null         | [4]           |
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+------------------+-----------------+-----------------+--------------+---------------+
+-------+-------------------------------------------------------------------------+-----------+---------------+------------+------------------+---------------------------+------------------------+------------------------+----------------+---------------------------------------+---------------------------------------+------------+-------------+------------+-------------+


Is it possible to not revert the formatting changes? I think it is less readable with the initial space removed.

Maybe we should replace this with a real HTML table?

Here's a snippet of code that we use in our notebooks to format PySpark dataframes as nicer tables:

from prettytable import PrettyTable
from IPython.core.magic import register_line_cell_magic

class DFTable(PrettyTable):
    def __repr__(self):
        return self.get_string()

    def _repr_html_(self):
        return self.get_html_string()

def to_table(df, num_rows=100):
    cols = df.columns
    t = DFTable()
    t.field_names = cols
    t.align = "r"
    for row in df.limit(num_rows).collect():
        d = row.asDict()
        t.add_row([ d[col] for col in cols ])
    return t

That will produce both HTML and text tables that have reasonable formatting.

Makes sense, let me look at this.

kbendick

Somewhat unrelated, but I noticed that lower_bounds has no value for some keys. Is this possibly confusing for readers of the docs?

[1 -> , 2 -> c]

Notice that 1 doesn't point to anything. Is this intentional and do we think this might confuse readers? I can open a separate issue if so.

rdblue · 2021-11-01T16:17:48Z

@kbendick, I think that we should convert lower and upper bounds into human-readable strings. Right now, I think we pass them to Spark as a map of id to binary.
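A rough sketch of what "human-readable" bounds could look like, assuming the bounds arrive as a map of field id to bytes in Iceberg's single-value binary form (4-byte little-endian for `int`, 8-byte little-endian for `long`, UTF-8 for `string`). The field types used here (`1` as `int`, `2` as `string`) are an assumption for illustration, not taken from the docs table. Raw numeric bytes are mostly non-printable, which may be why key `1` appears to point to nothing.

```python
import struct


def decode_bound(type_name, raw):
    """Decode one bound value; only three primitive types are handled in this sketch."""
    if raw is None:
        return None
    if type_name == "int":
        return struct.unpack("<i", raw)[0]  # 4-byte little-endian
    if type_name == "long":
        return struct.unpack("<q", raw)[0]  # 8-byte little-endian
    if type_name == "string":
        return raw.decode("utf-8")
    raise ValueError(f"unsupported type: {type_name}")


def readable_bounds(bounds, field_types):
    """Map each field id to its decoded value, given a field id -> type name lookup."""
    return {fid: decode_bound(field_types[fid], raw) for fid, raw in bounds.items()}


# Hypothetical bounds for a file whose column 1 is an int and column 2 a string.
lower = {1: struct.pack("<i", 1), 2: b"c"}
decoded = readable_bounds(lower, {1: "int", 2: "string"})
print(decoded)
```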

szehon-ho · 2021-11-05T22:05:35Z

@rdblue @kbendick @samredai I made a markdown table for the files table. If you click 'View File' in GitHub, it should show what it looks like. If you want, I can extend this to the other tables too, or do that separately.
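For anyone converting the other tables, a small standard-library sketch of turning rows into a markdown table may save some manual formatting. The helper name `to_markdown_table` and the sample row are made up for illustration; this is not how the table in this PR was produced.

```python
def to_markdown_table(columns, rows):
    """Render a list of dict rows as a pipe-delimited markdown table."""

    def cell(value):
        # Escape pipes so a value cannot break the column layout.
        return str(value).replace("|", "\\|")

    header = "| " + " | ".join(columns) + " |"
    divider = "| " + " | ".join("---" for _ in columns) + " |"
    body = [
        "| " + " | ".join(cell(row[c]) for c in columns) + " |"
        for row in rows
    ]
    return "\n".join([header, divider] + body)


# Illustrative row using two of the files table columns.
table = to_markdown_table(
    ["file_format", "record_count"],
    [{"file_format": "PARQUET", "record_count": 1}],
)
print(table)
```

With PySpark, the rows could come from something like `[r.asDict() for r in df.collect()]`.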

By the way, I also found a possible problem with the spec_id column added in #3015: it should probably be hidden, just like the partition column, if the table is unpartitioned. I could fix that in another PR.

samredai · 2021-11-06T04:44:53Z

GitHub injects some overflow css to add the scroll bar, if we switch to markdown tables we'd have to add that as well. Here's how the table currently would look:

But adding the following:

extra.css

.markdown-table-container {
  width: 780px;
  overflow-x: scroll;
}

and then surrounding your markdown table with:

<div class="markdown-table-container" markdown="block">
...markdown table here...
</div>

will make it look like this with the scrollbar:

rdblue · 2021-11-07T19:10:35Z

+1 to adding the scroll bar. Thanks @samredai!

rdblue · 2021-11-07T19:12:06Z

The markdown table looks good to me. The only problem is that now just this table is markdown. Anyone want to follow up with an update for the other tables?

KnightChess · 2021-11-08T02:26:35Z

> The markdown table looks good to me. The only problem is that now just this table is markdown. Anyone want to follow up with an update for the other tables?

@rdblue I can update other tables in this pr #3482

szehon-ho · 2021-11-09T05:24:48Z

Thanks @samredai and @rdblue for the review. I added the scroll bar using a div and CSS as suggested. Screenshot attached.

KnightChess · 2021-11-10T01:44:57Z

With the scroll bar, there is an empty line under the table when there is no overflow. Am I using it incorrectly, or can the scroll bar be optimized, @szehon-ho?

szehon-ho · 2021-11-10T04:27:17Z

@KnightChess I'm not sure to be honest, can you put up your change on your pr? We can continue discussion on that pr instead of this one?

samredai · 2021-11-10T13:55:27Z

@szehon-ho can you update the CSS to the below? If we're using this for all markdown tables, auto will hide the scrollbar space that @KnightChess is seeing when the table width is below the container width.

.markdown-table-container {
  width: 780px;
  overflow-x: auto;
}

szehon-ho · 2021-11-10T18:00:15Z

Done, thanks @KnightChess for finding the issue that would occur for the other table, and @samredai for figuring it out

szehon-ho · 2021-12-14T22:47:50Z

Hey @samredai @rdblue, would we be able to continue moving this forward? I was out on vacation, but my understanding is that we still keep the markdown files in this repo?

samredai · 2021-12-14T23:14:14Z

@szehon-ho yes the plan is to keep the markdown files in this repo. There are some changes coming soon but any merged docs PR will be included. This looks good to me, I'll let @rdblue comment on if it's good to merge.

rdblue · 2021-12-14T23:33:07Z

Merged. Thanks, @szehon-ho!

szehon-ho · 2021-12-15T00:29:26Z

Thanks for the fast response!

* apache/iceberg#3723
* apache/iceberg#3732
* apache/iceberg#3749
* apache/iceberg#3766
* apache/iceberg#3787
* apache/iceberg#3796
* apache/iceberg#3809
* apache/iceberg#3820
* apache/iceberg#3878
* apache/iceberg#3890
* apache/iceberg#3892
* apache/iceberg#3944
* apache/iceberg#3976
* apache/iceberg#3993
* apache/iceberg#3996
* apache/iceberg#4008
* apache/iceberg#3758 and 3856
* apache/iceberg#3761
* apache/iceberg#2062
* apache/iceberg#3422
* remove restriction related to legacy parquet file list

github-actions bot added the docs label Oct 29, 2021

rdblue reviewed Oct 31, 2021

kbendick reviewed Oct 31, 2021

rdblue mentioned this pull request Nov 7, 2021

spark: supplement spark-queries.md doc #3482

Merged

szehon-ho and others added 3 commits November 8, 2021 21:19

Doc: Update Files metadata table

1f97b80

Files metadata table was missing some columns, and also missing an example for partitioned tables (users may want to know how to check what data files they have for a given partition)

Update spark-queries.md

576419b

Add scroll bar

24daf88

szehon-ho force-pushed the patch-5 branch from 72573ed to 24daf88 Compare November 9, 2021 05:21

Update css to overflow-x:auto

081fa7c

rdblue approved these changes Dec 14, 2021

rdblue merged commit 40ae303 into apache:master Dec 14, 2021

jackye1995 pushed a commit to jackye1995/iceberg-docs that referenced this pull request Feb 8, 2022

https://github.com/apache/iceberg/pull/3422

7719819
